Wanted: Floating-Point Add Round-off Error instruction
Authors
Abstract
We propose a new instruction (FPADDRE) that computes the round-off error in floating-point addition. We explain how this instruction benefits high-precision arithmetic operations in applications where double precision is not sufficient. Performance estimates on Intel Haswell, Intel Skylake, and AMD Steamroller processors, as well as the Intel Knights Corner co-processor, demonstrate that such an instruction would improve the latency of double-double addition by up to 55% and increase double-double addition throughput by up to 103%, with smaller, but non-negligible, benefits for double-double multiplication. The new instruction delivers up to 2× speedups on three benchmarks that use high-precision floating-point arithmetic: double-double matrix-matrix multiplication, compensated dot product, and polynomial evaluation via the compensated Horner scheme.
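For context, the quantity FPADDRE would return is exactly what the classic TwoSum error-free transformation computes in software today, using six dependent floating-point operations. A minimal C sketch under that assumption (the helper name two_sum and the intrinsic name in the comment are ours, not part of the proposal):

```c
#include <stdio.h>

/* TwoSum (Knuth): computes s = fl(a + b) and the exact round-off
 * error e such that a + b == s + e, using 6 floating-point ops.
 * With the proposed instruction this would collapse to something like
 * e = FPADDRE(a, b) (hypothetical intrinsic name). */
static void two_sum(double a, double b, double *s, double *e) {
    double sum = a + b;
    double bb  = sum - a;                    /* recovered contribution of b */
    *e = (a - (sum - bb)) + (b - bb);        /* exact rounding error */
    *s = sum;
}

int main(void) {
    double s, e;
    two_sum(1.0, 1e-16, &s, &e);
    printf("s = %.17g, e = %.17g\n", s, e);  /* s = 1, e = 1e-16 */
    return 0;
}
```

Double-double addition performs this transformation (plus a cheaper Fast2Sum) on its component limbs, which is why shortening the TwoSum dependency chain translates directly into the latency and throughput gains quoted above.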
Similar resources
Error bounds on complex floating-point multiplication with an FMA
The accuracy analysis of complex floating-point multiplication done by Brent, Percival, and Zimmermann [Math. Comp., 76:1469–1481, 2007] is extended to the case where a fused multiply-add (FMA) operation is available. Considering floating-point arithmetic with rounding to nearest and unit roundoff u, we show that their bound √5·u on the normwise relative error |ẑ/z − 1| of a complex product z c...
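To make the setting concrete, here is the textbook FMA-based evaluation of a complex product, the kind of scheme such error analyses consider; this is a standard formulation and our own assumption, not code from the cited paper:

```c
#include <math.h>
#include <complex.h>
#include <stdio.h>

/* Complex product (a+ib)(c+id) with one FMA per component:
 * the real part ac - bd and imaginary part ad + bc each incur
 * one rounding for the inner product and one for the FMA. */
static double complex cmul_fma(double a, double b, double c, double d) {
    double re = fma(a, c, -(b * d));
    double im = fma(a, d,   b * c);
    return re + im * I;
}

int main(void) {
    double complex z = cmul_fma(1.0, 2.0, 3.0, 4.0);
    printf("%g + %gi\n", creal(z), cimag(z));   /* -5 + 10i */
    return 0;
}
```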
Accurate Floating-Point Summation Part II: Sign, K-Fold Faithful and Rounding to Nearest
In this Part II we first refine the analysis of error-free vector transformations presented in Part I. Based on that, we present an algorithm for calculating the rounded-to-nearest result of s := ∑ p_i for a given vector of floating-point numbers p_i, as well as algorithms for directed rounding. A special algorithm for computing the sign of s is given, also working for huge dimensions...
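As a rough illustration of summation built on error-free transformations, here is a compensated (cascaded) summation sketch in C; it is a much simpler relative of the vector-transformation algorithms discussed above, not the authors' NearSum/AccSum code:

```c
#include <stdio.h>

/* Compensated summation: accumulate the exact TwoSum error of every
 * addition in c, and apply it as a single correction at the end. */
static double comp_sum(const double *p, int n) {
    double s = 0.0, c = 0.0;
    for (int i = 0; i < n; i++) {
        double t  = s + p[i];
        double bb = t - s;
        c += (s - (t - bb)) + (p[i] - bb);   /* exact error of this addition */
        s = t;
    }
    return s + c;
}

int main(void) {
    double p[] = {1e16, 1.0, -1e16, 1.0};
    /* exact sum is 2; plain left-to-right summation returns 1 */
    printf("%.17g\n", comp_sum(p, 4));
    return 0;
}
```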
More Instruction Level Parallelism Explains the Actual Efficiency of Compensated Algorithms
The compensated Horner algorithm and the Horner algorithm with double-double arithmetic improve the accuracy of polynomial evaluation in IEEE-754 floating-point arithmetic. Both yield a polynomial evaluation as accurate as if it were computed with the classic Horner algorithm in twice the working precision. Both algorithms also share the same low-level computation of the floating-point rounding ...
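A minimal C sketch of the compensated Horner scheme, using an FMA-based TwoProd and a TwoSum per step; this follows the standard textbook formulation and is not the authors' reference implementation:

```c
#include <math.h>
#include <stdio.h>

/* Compensated Horner: evaluate the polynomial with Horner's rule while
 * running a second Horner recurrence on the exact rounding errors of
 * every multiplication (via FMA) and addition (via TwoSum). */
static double comp_horner(const double *a, int degree, double x) {
    double s = a[degree];   /* Horner value   */
    double c = 0.0;         /* error estimate */
    for (int i = degree - 1; i >= 0; i--) {
        double p  = s * x;
        double pi = fma(s, x, -p);            /* TwoProd: p + pi == s*x */
        double t  = p + a[i];
        double bb = t - p;
        double sigma = (p - (t - bb)) + (a[i] - bb);  /* TwoSum error */
        s = t;
        c = c * x + (pi + sigma);             /* Horner on error terms */
    }
    return s + c;
}

int main(void) {
    /* (x - 1)^3 = -1 + 3x - 3x^2 + x^3, evaluated near its root */
    double a[] = {-1.0, 3.0, -3.0, 1.0};
    printf("%.17g\n", comp_horner(a, 3, 1.0 + 1e-5));
    return 0;
}
```

An FPADDRE-style instruction would shorten exactly the TwoSum portion of each loop iteration, which is the connection to the proposal above.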
Reducing Round-off Error in an ADSL Modem with Block-Floating-Point Structure
In this paper, we report an issue first observed when using block-floating-point calculations in the time equalizer of an ADSL modem. We believe this important phenomenon has been overlooked by researchers until now, and we show that neglecting it may increase the round-off error significantly and reduce the SNR of a system. The simulation results suggest that ignoring this issue may i...
What Every Computer Scientist Should Know About Floating-Point Arithmetic
Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising because floating-point is ubiquitous in computer systems. Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every op...
Journal: CoRR
Volume: abs/1603.00491
Pages: -
Publication year: 2016